Family Law
- North America > United States > Texas (0.14)
- North America > Canada > Ontario > National Capital Region > Ottawa (0.13)
- North America > Canada > Ontario > Toronto (0.13)
- (44 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Overview (1.00)
- (2 more...)
- Transportation > Passenger (1.00)
- Transportation > Air (1.00)
- Leisure & Entertainment (1.00)
- (25 more...)
Assessing the Reliability of Large Language Models in the Bengali Legal Context: A Comparative Evaluation Using LLM-as-Judge and Legal Experts
Aftahee, Sabik, Farhad, A. F. M., Mallik, Arpita, Dhar, Ratnajit, Karim, Jawadul, Noor, Nahiyan Bin, Solaiman, Ishmam Ahmed
Access to legal help in Bangladesh is difficult. People face high fees, complex legal language, a shortage of lawyers, and millions of unresolved court cases. Generative AI models like OpenAI GPT-4.1 Mini, Gemini 2.0 Flash, Meta Llama 3 70B, and DeepSeek R1 could potentially democratize legal assistance by providing quick and affordable legal advice. In this study, we collected 250 authentic legal questions from the Facebook group "Know Your Rights," where verified legal experts regularly provide authoritative answers. These questions were subsequently submitted to the four AI models, and responses were generated using a consistent, standardized prompt. A comprehensive dual evaluation framework was employed, in which a state-of-the-art LLM served as a judge, assessing each AI-generated response across four critical dimensions: factual accuracy, legal appropriateness, completeness, and clarity. Following this, the same set of questions was evaluated by three licensed Bangladeshi legal professionals according to the same criteria. In addition, automated evaluation metrics, including BLEU scores, were applied to assess response similarity. Our findings reveal a complex landscape where AI models frequently generate high-quality, well-structured legal responses but also produce dangerous misinformation, including fabricated case citations, incorrect legal procedures, and potentially harmful advice. These results underscore the critical need for rigorous expert validation and comprehensive safeguards before AI systems can be safely deployed for legal consultation in Bangladesh.
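The abstract lists BLEU among its automated similarity metrics. As a rough illustration of what that measures, here is a minimal, unsmoothed sentence-level BLEU sketch in pure Python; the function name and the single-reference, no-smoothing setup are simplifying assumptions, not the authors' actual pipeline (real evaluations typically use smoothed corpus BLEU from a toolkit such as NLTK or sacrebleu):

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, with counts."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against one reference.

    Geometric mean of modified n-gram precisions times a brevity
    penalty. Without smoothing, any missing n-gram order zeroes
    the score -- a simplification real toolkits avoid.
    """
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        overlap = sum((ngrams(cand, n) & ngrams(ref, n)).values())
        total = max(sum(ngrams(cand, n).values()), 1)
        if overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages trivially short candidates.
    bp = min(1.0, math.exp(1 - len(ref) / max(len(cand), 1)))
    return bp * math.exp(sum(log_precisions) / max_n)
```

An identical candidate scores 1.0; a candidate sharing no n-grams scores 0.0, which is why BLEU alone cannot distinguish a fluent wrong answer from a correct one — hence the paper's judge-based and expert evaluations.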
- Asia > Bangladesh (0.56)
- Asia > India (0.04)
- North America > United States (0.04)
- Europe > United Kingdom (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study > Negative Result (0.46)
- Law > Criminal Law (0.93)
- Law > Family Law (0.93)
- Information Technology > Security & Privacy (0.68)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.73)
REAL: Reading Out Transformer Activations for Precise Localization in Language Model Steering
Zhan, Li-Ming, Liu, Bo, Xie, Chengqiang, Cao, Jiannong, Wu, Xiao-Ming
Inference-time steering aims to alter a large language model's (LLM's) responses without changing its parameters, but a central challenge is identifying the internal modules that most strongly govern the target behavior. Existing approaches often rely on simplistic cues or ad hoc heuristics, leading to suboptimal or unintended effects. We introduce REAL, a framework for identifying behavior-relevant modules (attention heads or layers) in Transformer models. For each module, REAL trains a vector-quantized autoencoder (VQ-AE) on its hidden activations and uses a shared, learnable codebook to partition the latent space into behavior-relevant and behavior-irrelevant subspaces. REAL quantifies a module's behavioral relevance by how well its VQ-AE encodings discriminate behavior-aligned from behavior-violating responses via a binary classification metric; this score guides both module selection and steering strength. We evaluate REAL across eight LLMs from the Llama and Qwen families and nine datasets spanning truthfulness enhancement, open-domain QA under knowledge conflicts, and general alignment tasks. REAL enables more effective inference-time interventions, achieving an average relative improvement of 20% (up to 81.5%) over the ITI method on truthfulness steering. In addition, the modules selected by REAL exhibit strong zero-shot generalization in cross-domain truthfulness-steering scenarios.
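The scoring idea in the abstract — rank a module by how well its quantized activations separate behavior-aligned from behavior-violating responses — can be sketched without the trained VQ-AE. Below is a toy, stdlib-only version in which a fixed codebook and a hand-picked "behavior-relevant" code set stand in for REAL's learned codebook and subspace partition; all names and data here are hypothetical:

```python
def quantize(vec, codebook):
    """Index of the nearest codebook entry (squared Euclidean distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist(vec, codebook[i]))

def module_relevance(aligned, violating, codebook, relevant_codes):
    """Score one module by a binary classification metric.

    An activation is predicted 'aligned' iff its nearest code falls
    in `relevant_codes` (the stand-in for the behavior-relevant
    subspace). The returned accuracy is the module's relevance score,
    used to rank modules for steering.
    """
    correct = sum(quantize(v, codebook) in relevant_codes for v in aligned)
    correct += sum(quantize(v, codebook) not in relevant_codes for v in violating)
    return correct / (len(aligned) + len(violating))
```

A module whose activations cleanly separate the two response sets scores near 1.0 and would be selected for intervention; one near chance (0.5) would be skipped.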
- Africa > Middle East > Egypt (0.45)
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Europe > Germany (0.14)
- (97 more...)
- Research Report > New Finding (1.00)
- Personal > Honors (1.00)
- Media > Music (1.00)
- Media > Film (1.00)
- Leisure & Entertainment > Games (1.00)
- (29 more...)
SafeLawBench: Towards Safe Alignment of Large Language Models
Cao, Chuxue, Zhu, Han, Ji, Jiaming, Sun, Qichao, Zhu, Zhenghao, Wu, Yinyu, Dai, Juntao, Yang, Yaodong, Han, Sirui, Guo, Yike
With the growing prevalence of large language models (LLMs), the safety of LLMs has raised significant concerns. However, there is still a lack of definitive standards for evaluating their safety due to the subjective nature of current safety benchmarks. To address this gap, we conducted the first exploration of LLMs' safety evaluation from a legal perspective by proposing the SafeLawBench benchmark. SafeLawBench categorizes safety risks into three levels based on legal standards, providing a systematic and comprehensive framework for evaluation. It comprises 24,860 multi-choice questions and 1,106 open-domain question-answering (QA) tasks. Our evaluation included 2 closed-source LLMs and 18 open-source LLMs using zero-shot and few-shot prompting, highlighting the safety features of each model. We also evaluated the LLMs' safety-related reasoning stability and refusal behavior. Additionally, we found that a majority voting mechanism can enhance model performance. Notably, even leading SOTA models like Claude-3.5-Sonnet and GPT-4o have not exceeded 80.5% accuracy in multi-choice tasks on SafeLawBench, while the average accuracy of 20 LLMs remains at 68.8%. We urge the community to prioritize research on the safety of LLMs.
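The majority-voting mechanism the abstract credits with improved performance is simple to state: sample several answers per question and keep the most frequent one. A minimal sketch, with hypothetical function names and toy answer keys (the paper's exact sampling setup is not specified here):

```python
from collections import Counter

def majority_vote(samples):
    """Most frequent answer among k sampled responses to one question.

    Counter.most_common breaks ties in favor of the answer inserted
    first, so ordering of samples matters only on exact ties.
    """
    return Counter(samples).most_common(1)[0][0]

def vote_accuracy(per_question_samples, gold):
    """Multi-choice accuracy after majority voting over each question's samples."""
    preds = [majority_vote(s) for s in per_question_samples]
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)
```

Voting helps when a model's errors are inconsistent across samples: an occasional wrong letter gets outvoted by the repeated correct one.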
- Asia > China > Hong Kong (0.05)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- (6 more...)
Sex-Fantasy Chatbots Are Leaking a Constant Stream of Explicit Messages
Several AI chatbots designed for fantasy and sexual role-playing conversations are leaking user prompts to the web in almost real time, new research seen by WIRED shows. Some of the leaked data shows people creating conversations detailing child sexual abuse, according to the research. Conversations with generative AI chatbots are near instantaneous: you type a prompt and the AI responds. If the systems are configured improperly, however, this can lead to chats being exposed. In March, researchers at the security firm UpGuard discovered around 400 exposed AI systems while scanning the web looking for misconfigurations.
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.82)
- Law > Family Law (0.58)
- Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (0.43)
Comparative Analysis of Audio Feature Extraction for Real-Time Talking Portrait Synthesis
Salehi, Pegah, Sheshkal, Sajad Amouei, Thambawita, Vajira, Gautam, Sushant, Sabet, Saeed S., Johansen, Dag, Riegler, Michael A., Halvorsen, Pål
The application of AI in education has gained widespread attention for its potential to enhance learning experiences across disciplines, including psychology [1, 2]. In the context of investigative interviewing, especially when questioning children who may be victims of abuse, AI offers a promising alternative to traditional training approaches. These conventional methods, often delivered through short workshops, fail to provide the hands-on practice, feedback, and continuous engagement needed for interviewers to master best practices in questioning child victims [3, 4]. Research has shown that while best practices recommend open-ended questions and discourage leading or suggestive queries [5, 6], many interviewers still struggle to implement these techniques effectively during real-world investigations [7]. The adoption of AI-powered child avatars provides a valuable solution, enabling Child Protective Services (CPS) workers to engage in realistic practice sessions without the ethical dilemmas associated with using real children, while simultaneously offering personalized feedback on their performance [8]. Our current system leverages advanced AI techniques within a structured virtual environment to train professionals in investigative interviewing. Specifically, this system integrates the Unity Engine to generate virtual avatars. Despite the potential advantages of our AI-based training system, its effectiveness largely depends on the perceived realism and fidelity of the virtual avatars used in these simulations [9]. We observed that avatars generated using Generative Adversarial Networks (GANs) demonstrated higher levels of realism than those created with the Unity Engine in several key aspects [10].
- Europe > Norway > Eastern Norway > Oslo (0.05)
- Europe > Norway > Northern Norway > Troms > Tromsø (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Speech (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
A Pornhub Chatbot Stopped Millions From Searching for Child Abuse Videos
For the past two years, millions of people searching for child abuse videos on Pornhub's UK website have been interrupted. Each of the 4.4 million times someone has typed in words or phrases linked to abuse, a warning message has blocked the page, saying that kind of content is illegal. And in half the cases, a chatbot has also pointed people to where they can seek help. The warning message and chatbot were deployed by Pornhub as part of a trial program, conducted with two UK-based child protection organizations, to find out whether people could be nudged away from looking for illegal material with small interventions. A new report analyzing the test, shared exclusively with WIRED, says the pop-ups led to a decrease in the number of searches for child sexual abuse material (CSAM) and saw scores of people seek support for their behavior.
- Europe > United Kingdom (0.37)
- Oceania > Australia > Tasmania (0.06)
- Law > Family Law (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (0.99)
Beyond Predictive Algorithms in Child Welfare
Moon, Erina Seh-Young, Saxena, Devansh, Maharaj, Tegan, Guha, Shion
Caseworkers in the child welfare (CW) sector use predictive decision-making algorithms built on risk assessment (RA) data to guide and support CW decisions. Researchers have highlighted that RAs can contain biased signals which flatten CW case complexities and that the algorithms may benefit from incorporating contextually rich case narratives, i.e., casenotes written by caseworkers. To investigate this hypothesized improvement, we quantitatively deconstructed two commonly used RAs from a United States CW agency. We trained classifier models to compare the predictive validity of RAs with and without casenote narratives and applied computational text analysis on casenotes to highlight topics uncovered in the casenotes. Our study finds that common risk metrics used to assess families and build CW predictive risk models (PRMs) are unable to predict discharge outcomes for children who are not reunified with their birth parent(s). We also find that although casenotes cannot predict discharge outcomes, they contain contextual case signals. Given the lack of predictive validity of RA scores and casenotes, we propose moving beyond quantitative risk assessments for public sector algorithms and towards using contextual sources of information such as narratives to study public sociotechnical systems.
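The core experiment — train classifiers on RA features with and without casenote-derived features and compare predictive validity — can be mimicked with any classifier. A stdlib-only nearest-centroid toy, where the feature vectors and 0/1 discharge labels are entirely hypothetical and the real study used trained classifier models, not this method:

```python
def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def nearest_centroid_accuracy(train, test):
    """Fit a two-class nearest-centroid classifier; report test accuracy.

    `train`/`test` are lists of (features, label) pairs with labels
    0/1 (e.g. reunified vs. other discharge outcome). Running this
    once on RA-only features and once on RA + casenote features gives
    the kind of head-to-head comparison the study reports.
    """
    cents = {y: centroid([x for x, lbl in train if lbl == y]) for y in (0, 1)}
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    correct = sum(
        min(cents, key=lambda y: dist(x, cents[y])) == lbl for x, lbl in test
    )
    return correct / len(test)
```

The study's finding corresponds to both feature sets yielding near-chance accuracy on non-reunification outcomes, which is what motivates its argument against purely quantitative risk scores.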
- North America > Canada > Ontario > Toronto (0.28)
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > North Carolina (0.04)
- (9 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.67)
- Law > Criminal Law (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Health & Medicine > Therapeutic Area (0.93)
- (5 more...)
Researchers found child abuse material in the largest AI image generation dataset
Researchers from the Stanford Internet Observatory say that a dataset used to train AI image generation tools contains at least 1,008 validated instances of child sexual abuse material. The Stanford researchers note that the presence of CSAM in the dataset could allow AI models that were trained on the data to generate new and even realistic instances of CSAM. LAION, the non-profit that created the dataset, told 404 Media that it "has a zero tolerance policy for illegal content and in an abundance of caution, we are temporarily taking down the LAION datasets to ensure they are safe before republishing them." The organization added that, before publishing its datasets in the first place, it created filters to detect and remove illegal content from them. However, 404 points out that LAION leaders have been aware since at least 2021 that there was a possibility of their systems picking up CSAM as they vacuumed up billions of images from the internet.
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Law > Family Law (0.87)
- Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (0.71)